Abstract

We develop dynamic dictionaries on the word RAM that use asymptotically optimal space, up 
to constant factors, subject to insertions and deletions, and subject to supporting perfect-hashing 
queries and/or membership queries, each operation in constant time with high probability. 
When supporting only membership queries, we attain the optimal space bound of $\Theta(n \lg \frac{u}{n})$
bits, where n and u are the sizes of the dictionary and the universe, respectively. Previous
dictionaries either did not achieve this space bound or had time bounds that were only expected
and amortized. When supporting perfect-hashing queries, the optimal space bound depends on
the range {1, 2, ..., n + t} of hashcodes allowed as output. We prove that the optimal space
bound is $\Theta(n \lg\lg \frac{u}{n} + n \lg \frac{n}{t+1})$ bits when supporting only perfect-hashing queries, and it is
$\Theta(n \lg \frac{u}{n} + n \lg \frac{n}{t+1})$ bits when also supporting membership queries. All upper bounds are new,
as is the $\Omega(n \lg \frac{n}{t+1})$ lower bound.


1 Introduction 

The dictionary is one of the most fundamental data-structural problems in computer science. In 
its basic form, a dictionary allows some form of “lookup” on a set S of n objects, and in a dynamic
dictionary, elements can be inserted into and deleted from the set S. However, being such
a well-studied problem, there are many variations in the details of what exactly is required of a 
dictionary, particularly the lookup operation, and these variations greatly affect the best possible 
data structures. To enable a systematic study, we introduce a unified view consisting of three possible
types of queries that, in various combinations, capture the most common types of dictionaries
considered in the literature: 

Membership: Given an element x, is it in the set S?

Retrieval: Given an element x in the set S, retrieve r bits of data associated with x. (The outcome 
is undefined if x is not in S.) The associated data can be set upon insertion or with another 
update operation. We state constant time bounds for these operations, which ignore the $\Theta(r)$
time, divided by the word size, required to read or write r bits of data.

Perfect hashing: Given an element x in the set S, return the hashcode of x. The data structure
assigns to each element x in S a unique hashcode in [n + t],¹ for a specified parameter t (e.g.,
t = 0 or t = n). Hashcodes are stable: the hashcode of x must remain fixed for the duration
that x is in S. (Again the outcome is undefined if x is not in S.)

¹ The notation [k] represents the set {0, 1, ..., k − 1}.

Standard hash tables generally support membership and retrieval. Some hash tables with
open addressing (no chaining) also support perfect hashing, but the expected running time is
superconstant unless $t = \Omega(n)$. However, standard hash tables are not particularly space efficient
if n is close to u: they use $\Theta(n)$ words, which is $\Theta(n \lg u)$ bits for a universe of size u, whereas only
$\log_2 \binom{u}{n} = \Theta(n \lg \frac{u}{n})$ bits (assuming n < u/2) are required to represent the set S.²
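
To make the gap concrete, the following minimal Python sketch compares the information-theoretic bound with the cost of a plain table of n words. The function names and the sample values of u and n are illustrative assumptions, not part of the paper, and all hidden constant factors are ignored.

```python
import math

def info_bits(u: int, n: int) -> float:
    """Bits needed to name one n-element subset of a u-element universe."""
    return math.log2(math.comb(u, n))

def table_bits(u: int, n: int) -> float:
    """Cost of n machine words of log2(u) bits each (constants ignored)."""
    return n * math.log2(u)

# Hypothetical sample sizes, chosen only for illustration.
u, n = 2**16, 2**10
print("log2 C(u, n)   :", round(info_bits(u, n)))
print("n * log2(u / n):", round(n * math.log2(u / n)))
print("n * log2(u)    :", round(table_bits(u, n)))
```

The first two quantities stay within a constant factor of each other, while the third grows with lg u rather than lg(u/n).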

Any dictionary supporting membership needs at least $\log_2 \binom{u}{n}$ bits of space. But while such
dictionaries are versatile, they are large, and membership is not always required. For example,
Chazelle et al. [3] explore the idea of a static dictionary supporting only retrieval, with several
applications related to Bloom filters. For other data-structural problems, such as range reporting
in one dimension [?], the only known way to get optimal space bounds is to use a dictionary
that supports retrieval but not membership. The retrieval operation requires storing the r-bit data
associated with each element, for a total of at least rn bits. If r is asymptotically less than $\lg \frac{u}{n}$,
then we would like to avoid actually representing the set S. However, as we shall see, we still need 
more than rn bits even in a retrieval-only dictionary. 

Perfect hashing is stronger than retrieval, up to constant factors in space, because we can 
simply store an array mapping hashcodes to the r-bit data for each element. Therefore we focus on 
developing dictionaries supporting perfect hashing, and obtain retrieval for free. Conversely, lower 
bounds on retrieval apply to perfect hashing as well. Because hashcodes are stable, this approach 
has the additional property that the associated data never moves, which can be useful, e.g. when 
the data is large or is stored on disk. 
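
The reduction from retrieval to perfect hashing can be sketched as follows. This is only an illustration under stated assumptions: `StableHasher` is a hypothetical brute-force stand-in that stores an explicit hashcode per key (so it is not space efficient), and the class and method names are not from the paper. The point is that the data array is indexed by stable hashcodes, so associated data never moves.

```python
class StableHasher:
    """Brute-force stand-in: explicit, stable hashcodes in [n + t]."""

    def __init__(self, n: int, t: int):
        self.n = n
        self.free = list(range(n + t - 1, -1, -1))  # pop() yields 0, 1, 2, ...
        self.code = {}

    def insert(self, x) -> int:
        if x not in self.code:
            if len(self.code) >= self.n:
                raise OverflowError("set already holds n elements")
            self.code[x] = self.free.pop()
        return self.code[x]

    def delete(self, x) -> None:
        self.free.append(self.code.pop(x))

    def hashcode(self, x) -> int:
        return self.code[x]  # the paper leaves this undefined for x not in S


class Retrieval:
    """Retrieval obtained from perfect hashing plus one array of data slots."""

    def __init__(self, n: int, t: int):
        self.hasher = StableHasher(n, t)
        self.data = [None] * (n + t)

    def insert(self, x, value) -> None:
        self.data[self.hasher.insert(x)] = value

    def retrieve(self, x):
        return self.data[self.hasher.hashcode(x)]

    def delete(self, x) -> None:
        self.hasher.delete(x)
```

In this sketch, deleting one key and inserting another reuses the freed slot, while the slots of all other keys are untouched.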

Despite substantial work on dictionaries and perfect hashing (see Section 1.2), no dynamic
dictionary data structure supporting any of the three types of queries simultaneously achieves 
(1) constant time bounds with high probability and (2) compactness in the sense that the space is 
within a constant factor of optimal. 

1.1 Our results 

We characterize the optimal space bound, up to constant factors, for a dynamic dictionary supporting
any subset of the three operations, designing data structures to achieve these bounds and in
some cases improving the lower bound. To set our results in context, we first state the two known
lower bounds on the space required by a dictionary data structure. First, as mentioned above,
any dictionary supporting membership (even static) requires $\Omega(n \lg \frac{u}{n})$ bits of space, assuming that
n < u/2. Second, any dictionary supporting retrieval must satisfy the following recent and strictly
weaker lower bound:

Theorem 1 ([?]). Any dynamic dictionary supporting retrieval (and therefore any dynamic dictionary
supporting perfect hashing) requires $\Omega(n \lg\lg \frac{u}{n})$ bits of space in expectation, even when the
associated data is just r = 1 bit.

Surprisingly, for dynamic dictionaries supporting perfect hashing, this lower bound is neither 
tight nor subsumed by a stronger lower bound. In Section 5 we prove our main lower-bound result,
which complements Theorem 1 depending on the value of t:

² Throughout this paper, lg x denotes $\log_2(2 + x)$, which is positive for all x > 0.

Dictionary queries supported               Optimal space                                                          Reference
-----------------------------------------  ---------------------------------------------------------------------  ----------------------
retrieval                                  $\Theta(n \lg\lg \frac{u}{n} + nr)$                                    O in §4; Ω in [?]
retrieval + perfect hashing                $\Theta(n \lg\lg \frac{u}{n} + n \lg \frac{n}{t+1} + nr)$              O in §4; Ω in §5 and [?]
membership                                 $\Theta(n \lg \frac{u}{n})$                                            O in §3; Ω is standard
membership + retrieval                     $\Theta(n \lg \frac{u}{n} + nr)$                                       O in §3; Ω is standard
membership + retrieval + perfect hashing   $\Theta(n \lg \frac{u}{n} + n \lg \frac{n}{t+1} + nr)$                 O in §3; Ω in §5

Table 1: Optimal space bounds for all types of dynamic dictionaries supporting operations in
constant time with high probability. The upper bounds supporting retrieval without perfect hashing
can be obtained by substituting t = n. The $\Theta(n \lg \frac{u}{n})$ bounds assume n < u/2; more precisely, they
are $\Theta(\log_2 \binom{u}{n})$.
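
As a rough numerical aid, the sketch below evaluates the space expressions of Table 1 with every hidden constant set to 1, using the paper's convention lg x = log2(2 + x). The function name, its keyword arguments, and the sample parameters are assumptions made for this illustration; the output is not an actual space usage.

```python
import math

def lg(x: float) -> float:
    """The paper's convention: lg x = log2(2 + x)."""
    return math.log2(2 + x)

def table1_bits(n, u, t=0, r=0, membership=False, perfect_hashing=False):
    """Evaluate a Table 1 expression with all hidden constants set to 1."""
    ratio = u / n
    bits = n * (lg(ratio) if membership else lg(lg(ratio)))
    if perfect_hashing:
        bits += n * lg(n / (t + 1))  # extra cost of a hashcode range [n + t]
    return bits + n * r

# Hypothetical parameters, chosen only for illustration.
n, u = 2**20, 2**40
for t in (0, n // 1024, n):
    print(t, round(table1_bits(n, u, t=t, perfect_hashing=True)))
```

Under these assumptions, the perfect-hashing term shrinks from about lg n bits per element at t = 0 to a constant number of bits per element at t = n.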


Theorem 2. Any dynamic dictionary supporting perfect hashing with hashcodes in [n + t] must
use $\Omega(n \lg \frac{n}{t+1})$ bits of space in expectation, regardless of the query and update times, assuming that
u > n + (1 + ε)t for some constant ε > 0.

Our main upper-bound result is a dynamic dictionary supporting perfect hashing that matches 
the sum of the two lower bounds given by Theorems 1 and 2. Specifically, Section 4 proves the
following theorem: 

Theorem 3. There is a dynamic dictionary that supports updates and perfect hashing with hashcodes
in [n + t] (and therefore also retrieval queries) in constant time per operation, using
$O(n \lg\lg \frac{u}{n} + n \lg \frac{n}{t+1})$ bits of space. The query and space complexities are worst-case, while updates are handled
in constant time with high probability.

To establish this upper bound, we find it necessary to also obtain optimal results for dynamic 
dictionaries supporting both membership and perfect hashing. In Section 3 we find that the best
possible space bound is a sum of two lower bounds in this case as well: 

Theorem 4. There is a dynamic dictionary that supports updates, membership queries, and perfect
hashing with hashcodes in [n + t] (and therefore also retrieval queries) in constant time per operation,
using $O(n \lg \frac{u}{n} + n \lg \frac{n}{t+1})$ bits of space. The query and space complexities are worst-case, while
updates are handled in constant time with high probability.

To prove Theorems 3 and 4, we develop a family of quotient hash functions. These hash
functions are permutations of the universe; they and their inverses are computable in constant time 
given a small-space representation; and they have natural distributional properties when mapping 
elements into buckets. (In contrast, we do not know any hash functions with these properties and, 
say, 4-wise independence.) These hash functions may be of independent interest. 

Table 1 summarizes our completed understanding of the optimal space bounds for dynamic
dictionaries supporting updates and any combination of the three types of queries in constant time 
with high probability. All upper bounds are new, as are the lower bounds for perfect hashing with 
or without membership. 

1.2 Previous work 

There is a huge literature on various types of dictionaries, and we do not try to discuss it exhaustively.
A milestone in the history of constant-time dictionaries is the realization that the space
and query bounds can be made worst case (construction and updates are still randomized). This
was achieved in the static case by Fredman, Komlós, and Szemerédi [?] with a dictionary that
uses $O(n \lg u)$ bits. Starting with this work, research on the dictionary problem evolved in two
orthogonal directions: creating dynamic dictionaries with good update bounds, and reducing the 
space. 

In the dynamic case, the theoretical ideal is to make updates run in constant time per operation 
with high probability. After some work, this was finally achieved by the high-performance dictionaries
of Dietzfelbinger and Meyer auf der Heide [?]. However, this desideratum is usually considered
difficult to achieve, and most dictionary variants that have been developed since then fall short of 
it, by having amortized and/or expected time bounds (not with high probability). 

As far as space is concerned, the goal was to get closer to the information theoretic lower bound 
of $\log_2 \binom{u}{n}$ bits for membership. Brodnik and Munro [?] were the first to solve static membership
using $O(n \lg \frac{u}{n})$ bits, which they later improved to $(1 + o(1)) \log_2 \binom{u}{n}$. Pagh [?] solves the static
dictionary problem with space $\log_2 \binom{u}{n}$ plus the best lower-order term known to date. For the dynamic
problem, the best known result is by Raman and Rao [?], achieving space $(1 + o(1)) \log_2 \binom{u}{n}$.
Unfortunately, in this structure, updates take constant time amortized and in expectation (not with 
high probability). These shortcomings seem inherent to their technique. 

Thus, none of the previous results simultaneously achieve good space and update bounds, a 
gap filled by our work. Another shortcoming of the previous results lies in the understanding 
of dynamic dictionaries supporting perfect hashing. The dynamic perfect hashing data structure 
of Dietzfelbinger et al. [?] supports membership and a weaker form of perfect hashing in which
hashcodes are not stable, though only an amortized constant number of hashcodes change per
update. This structure achieves a suboptimal space bound of $\Theta(n \lg u)$ and updates take constant
time amortized and in expectation. No other dictionaries can answer perfect hashing queries except
by associating an explicit hashcode with each element, which requires $\Theta(n \lg n)$ additional bits.
Our result for membership and perfect hashing is the first achieving $O(n \lg \frac{u}{n})$ space, even for weak
update bounds. A more fundamental problem is that all previous dynamic data structures supporting perfect
hashing use $\Omega(n \lg \frac{u}{n})$ space, even when we do not desire membership queries so the information
theoretic lower bound does not apply.

Perfect hashing in the static case has been studied intensely, and with good success. There, 
it is possible to achieve good bounds with t = 0, and this has been the focus of attention. When 
membership is required, a data structure using space $(1 + o(1)) \log_2 \binom{u}{n}$ was finally developed in [?].
Without membership, the best known lower bound is $n \log_2 e + \lg\lg u + O(\lg n)$ bits [?], while the
best known data structure uses $n \log_2 e + \lg\lg u + O(n \frac{(\lg\lg n)^2}{\lg n} + \lg\lg\lg u)$ bits [?]. Our lower bound
depending on t shows that in the dynamic case, even $t = \Theta(n^{1-\varepsilon})$ requires $\Omega(n \lg n)$ space, making
the problem uninteresting. Thus, we identify an interesting hysteresis phenomenon, where the 
dynamic nature of the problem forces the data structure to remember more information and use 
more space. 

Retrieval without membership was introduced as “Bloomier filters” by Chazelle et al. [3]. The
terminology is by analogy with the Bloom filter, a static structure supporting approximate membership
(a query we do not consider in this paper) in $O(n \lg \frac{1}{\varepsilon})$ space, where ε is the false-positive
rate. Bloomier filters are static dictionaries supporting retrieval using $O(nr + \lg\lg n)$ bits of space.
For dynamic retrieval of r = 1 bit without membership, Chazelle et al. [3] show that $\Omega(n \lg\lg u)$ bits
of space can be necessary in the case $n^{3+\varepsilon} < u < 2^{n^{O(1)}}$. Their bound is improved in [?], giving
Theorem 1. On the upper-bound side, the only previous result is that of [?]: dynamic perfect hashing
for $t = \Theta(n/\lg u)$ using space $O(n \lg\lg u)$. Our result improves $\lg\lg u$ to $\lg\lg \frac{u}{n}$ and offers the
full tradeoff depending on t.

1.3 Details of the model 

A few details of the model are implicit throughout this paper. The model of computation is the 
Random Access Machine with cells of lg u bits (the word RAM). Because we ignore constant factors,
we assume without loss of generality that u, t, and b are all exact powers of 2.

In dynamic dictionaries supporting perfect hashing, n is not the current size of the set S, but 
rather n is a fixed upper bound on the size of S. Similarly, t is a fixed parameter. This assumption 
is necessary because of the problem statement: hashcodes must be stable and the hashcode space 
is defined in terms of n and t. This assumption is not necessary for retrieval queries, although we 
effectively assume it through our reduction to perfect hashing. Our results leave open whether a 
dynamic dictionary supporting only retrieval can achieve space bounds depending on the current 
size of the set S instead of an upper bound n; such a result would in some sense improve the first 
row of Table 1.

On the other hand, if we want a dynamic dictionary supporting membership but not perfect 
hashing (but still supporting retrieval), then we can rebuild the data structure whenever |S| changes
by a constant factor, and change the upper bounds n and t then. This global rebuilding can be 
deamortized at the cost of increasing space by a constant factor, using the standard tricks involving 
two copies of the data structure with different values of n and t. 

Another issue is the model of memory allocation. We assume that the dynamic data structure
lives in an infinite array of word-length cells. At any time, the space usage of the data structure 
is the length of the shortest prefix of the array containing all nonblank cells. This model charges 
appropriately for issues such as external fragmentation (unlike, say, assuming that the system 
provides memory-block allocation) and is easy to implement in practical systems. See [13] for a
discussion of this issue. 

Finally, we prove that our insertions work in constant time with high probability, that is, with 
probability $1 - 1/n^c$ for any desired constant c > 0. Thus, with polynomially small probability, the
bounds might be violated. For a with-high-probability bound, the data structure could fail in this 
low-probability event. To obtain the bounds also in expectation and with zero error, we can freeze 
the high-performance data structure in this event and fall back to a simple data structure, e.g., a 
linked list of any further inserted elements. Any operations (queries or deletions) on the old elements 
are performed on the high-performance data structure, while any operations on new elements (e.g., 
insertions) are performed on the simple data structure. The bounds hold in expectation provided 
that the data structure is used for only a polynomial amount of time. 

2 Quotient Hash Functions 

We define a quotient hash function in terms of three parameters: the universe size u, the number 
of buckets b, and an upper bound n on the size of the sets of interest. A quotient hash function is 
simply a bijective function $h : [u] \to [b] \times [\frac{u}{b}]$. We interpret the first output as a bucket, and the
second output as a “quotient” which, together with the bucket, uniquely identifies the element. We
write $h(x)_1$ and $h(x)_2$ when we want to refer to individual outputs of h.

We are interested in sets of elements $S \subseteq [u]$ with $|S| \le n$. For such a set S and an element x,
define $B_h(S, x) = \{y \in S \mid h(y)_1 = h(x)_1\}$, i.e. the set of elements mapped to the same bucket as
x. For a threshold t, define $C_h(S, t) = \{x \in S \mid \#B_h(S, x) \ge t\}$, i.e. the set of elements which map
to buckets containing at least t elements. These are elements that “collide” beyond the allowable
threshold.
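
The definitions above can be illustrated with a toy example. The sketch below assumes a power-of-two universe and uses the map x ↦ (ax + c) mod u with odd a, which is a bijection but is not the hash family constructed in this paper and carries none of its guarantees; all function names are hypothetical. It shows the bucket/quotient split, the inverse, and the sets written B_h(S, x) and C_h(S, t) in the text.

```python
def make_quotient_hash(u: int, b: int, a: int, c: int):
    """Toy bijection [u] -> [b] x [u/b]; u and b are powers of two, a is odd."""
    assert u & (u - 1) == 0 and u % b == 0 and a % 2 == 1
    q = u // b                    # number of quotients per bucket
    a_inv = pow(a, -1, u)         # modular inverse of a (Python 3.8+)

    def h(x: int):
        y = (a * x + c) % u       # permute the universe
        return divmod(y, q)       # (bucket in [b], quotient in [u/b])

    def h_inv(bucket: int, quotient: int) -> int:
        return (a_inv * (bucket * q + quotient - c)) % u

    return h, h_inv

def bucket_set(h, S, x):
    """B_h(S, x): elements of S that share a bucket with x."""
    return {y for y in S if h(y)[0] == h(x)[0]}

def overflow_set(h, S, t):
    """C_h(S, t): elements of S in buckets holding at least t elements of S."""
    return {x for x in S if len(bucket_set(h, S, x)) >= t}
```

Because (bucket, quotient) determines x, a per-bucket structure only needs to store quotients, which is the property the later sections rely on.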

Theorem 5. There is an absolute constant α < 1 such that for any u, n and b, there exists a family
of quotient hash functions $\mathcal{H} = \{h : [u] \to [b] \times [\frac{u}{b}]\}$ satisfying:

• an $h \in \mathcal{H}$ can be represented in $O(n^{\alpha})$ space and sampled in $O(n^{\alpha})$ time;

• h and $h^{-1}$ can be evaluated in constant time on a RAM;

• for any fixed $S \subset [u]$ with $|S| \le n$ and any δ < 1, the following holds with high probability over the
choice of h:

    if $b \ge n$:  $\#C_h(S, 2) \le \frac{2n^2}{b} + n^{\alpha}$
    if $b < n$:  $\#C_h(S, (1+\delta)\frac{n}{b} + 1) \le 2n e^{-\delta^2 n/(3b)} + n^{\alpha}$

It is easy to get an intuitive understanding of these bounds. In the case $b \ge n$, the expected
number of collisions generated by universal hashing (2-independent hashing) would be $\frac{n^2}{b}$. For
b < n, we can compare against a highly independent hash function. Then, the expected number of
elements that land in overflowing buckets is $n e^{-\delta^2 n/(3b)}$, by a simple Chernoff bound. Our family
matches these two bounds, up to a constant factor and an additive error term of $O(n^{\alpha})$, which
are both negligible for our purposes. The advantage of our hash family is two-fold. First, it gives 
quotient hash functions, which is essential for our data structure. Second, the number of overflowing 
elements is guaranteed with high probability, not just in expectation. 

2.1 Permutation hash functions 

It is useful to relate the concept of quotient hash functions to another concept, namely permutation 
hash functions. Such functions are bijections from the universe to itself. We call a family of
permutations k-independent if, for any input set $S \subseteq [u]$ with at most k elements, the output of a
randomly drawn hash function applied to the elements of S is indistinguishable from the output of 
a truly random permutation. 

It is easy to construct 2-independent permutations. A standard construction for universal 
hashing is to map $x \mapsto ax + b$, where a and b are random, and all elements come from a field
$\mathbb{Z}_u$. Maintaining the same family with the restriction $a \ne 0$ gives 2-independent permutations.
Unfortunately, it is not known how to construct good k-wise independent families, for k > 3. In
fact, small families are not even known to exist. If one is satisfied with almost k-wise independence,
small families can be constructed (see [?] and citations therein), but it is not known how to achieve
constant evaluation time. 
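
The 2-independence of the affine family can be checked exhaustively for a tiny field. The sketch below assumes a prime modulus p, so that the integers mod p form a field; the function names are illustrative. It verifies that every member is a permutation and that any two distinct inputs are sent to each ordered pair of distinct outputs by exactly one member, which is what a truly random permutation would give on two points.

```python
from collections import Counter
from itertools import permutations

def affine_family(p: int):
    """All maps x -> (a*x + b) mod p with a != 0, encoded as pairs (a, b)."""
    return [(a, b) for a in range(1, p) for b in range(p)]

def is_2_independent(p: int) -> bool:
    family = affine_family(p)
    # Every member must be a bijection on {0, ..., p-1}.
    for a, b in family:
        if len({(a * x + b) % p for x in range(p)}) != p:
            return False
    # Every pair of distinct inputs must hit every ordered pair of
    # distinct outputs exactly once across the family.
    for x, y in permutations(range(p), 2):
        hits = Counter(((a * x + b) % p, (a * y + b) % p) for a, b in family)
        if len(hits) != p * (p - 1) or set(hits.values()) != {1}:
            return False
    return True

print(is_2_independent(7))
```

For a composite modulus such as 8 the check fails, because some multipliers a are not invertible; a power-of-two universe would need a different field (for example a binary extension field), which this sketch does not implement.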

A permutation hash function can be converted trivially into a quotient hash function: the bucket
is given by some lg b bits of the output, and the quotient is the rest. To achieve bounds similar
to that of Theorem 5, all we need is a family of permutations with sufficiently high independence,
but as mentioned already, we do not know how to construct one. The trick behind Theorem 5 is to
recognize that permutation families, though theoretically clean and interesting, are stronger than
what we need. We can instead get away with slightly weaker concentration guarantees. One should
note that the family of hash functions we propose does not have any k-independence guarantee.

2.2 Construction of the hash family

It will be convenient to assume that $u \le n^c$, for some constant c. If this does not hold, we can
reduce the universe to $n^c$, for some big enough c, by the following:

Transformation 1. Apply a random 2-independent permutation on the original universe. Keep
only the first c lg n bits of the result, and make the rest part of the quotient.

The expected number of collisions generated by keeping only c lg n bits is bounded by $n^{2-c}$.
Thus, by choosing a large enough constant c, we can avoid any collision with the required high- 
probability guarantee. 
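
One way to see the collision bound, written out as a short calculation. It assumes that $n^c$ is a power of two dividing u, and writes π for the random 2-independent permutation (a symbol introduced here only for the derivation). For distinct x and y, the pair (π(x), π(y)) is uniform over ordered pairs of distinct values, and exactly $u/n^c - 1$ of the $u - 1$ possible values of π(y) share the kept prefix with π(x):

```latex
\Pr\bigl[\pi(x)\text{ and }\pi(y)\text{ agree on the first } c\lg n \text{ bits}\bigr]
  = \frac{u/n^{c} - 1}{u - 1} \;\le\; \frac{1}{n^{c}},
\qquad
\mathbf{E}\bigl[\#\text{colliding pairs in } S\bigr]
  \;\le\; \binom{n}{2}\cdot\frac{1}{n^{c}} \;\le\; n^{2-c}.
```

By Markov's inequality, the probability that any collision occurs is then at most $n^{2-c}$, which is polynomially small for a large enough constant c.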

From now on, it will be convenient to interpret the universe as a two-dimensional table, with 
$n^{3/4}$ columns and $u/n^{3/4}$ rows. The plan is to use this column structure as a means of generating
independence. Imagine a hash function that generates few collisions in expectation, but not necessarily
with high probability. However, we can apply a different random hash function inside each
column. The expectation is unchanged, but now Chernoff bounds can be used to show that we are 
close to the expectation with high probability, because the behavior of each column is independent. 

However, to put this plan into action, we need to guarantee that the elements of S are spread 
rather uniformly across columns. We do this by applying a random circular shift to each row. 

Transformation 2. Consider a highly independent hash function mapping row numbers to $[n^{3/4}]$.
Inside each row, apply a circular shift by the hash function of that row.

Note that the number of rows can be pretty large (larger than n), so we cannot afford a truly
random shift for each row. However, the number of rows is polynomial, and we can use Siegel’s
family of highly independent hash functions [?] to generate highly independent shifts. These
hash functions can be represented in $n^{\varepsilon}$ space, take constant time to evaluate, and are
$n^{\Omega(1)}$-independent with high probability.
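
The row-shift step can be sketched as follows. This is an illustration only: it stores one truly random shift per row, which is exactly what the text says is unaffordable for a large universe, as a stand-in for a highly independent hash function of the row number. The function name, the row-major layout, and the seed parameter are assumptions of this sketch.

```python
import random

def make_row_shifter(u: int, cols: int, seed: int = 0):
    """View [u] as a table with `cols` columns (row-major) and shift each row."""
    assert u % cols == 0
    rows = u // cols
    rng = random.Random(seed)
    # Stand-in for a highly independent hash of the row number.
    shift = [rng.randrange(cols) for _ in range(rows)]

    def forward(x: int) -> int:
        row, col = divmod(x, cols)
        return row * cols + (col + shift[row]) % cols

    def backward(y: int) -> int:
        row, col = divmod(y, cols)
        return row * cols + (col - shift[row]) % cols

    return forward, backward
```

Each row is permuted within itself, so the map is a bijection on the universe and is inverted by shifting back; elements that were all in one column are spread across columns according to the per-row shifts.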

Because shifts are uniform, the expected number of items in each column is $\frac{|S|}{n^{3/4}} \le n^{1/4}$. We
can apply Chernoff bounds for random variables with limited independence to show that a column
does not have more than $n^{1/4} + n^{1/5}$ elements with very high probability. Specifically, we apply [?,
Theorem 5.1]. Let $X_i$ be the indicator random variable specifying whether an element from S is
mapped to our column of interest in row i. Then $\mu = \mathbf{E}[\sum_i X_i] = n^{1/4}$. We are interested in the
event $\sum_i X_i > \mu + n^{1/5}$. The theorem guarantees that, for independence $k = \Theta(\frac{(n^{1/5})^2}{n^{1/4}}) = \Theta(n^{0.15})$,
the probability of this event is $e^{-\Omega(k)}$. Because Siegel’s hash functions give $k = n^{\Omega(1)}$, this event
happens with exponentially small probability.

In the remainder, we consider only $n^{1/4}$ elements of S from each column. In the worst case, all
the excess values from S will be mapped to buckets that are already full due to these “normal”
elements. However, the total number of excess elements is at most $n^{3/4} \cdot n^{1/5} = n^{0.95}$ with high
probability, and this is swallowed by our $n^{\alpha}$ error term.

The case $b \ge n$. Remember that, in this case, our goal is to get close to the collisions generated
by a 2-independent permutation, but with high probability. As explained above, we can achieve
this effect through column independence. 

Transformation 3. Apply a 2-universal permutation inside each column. For each column, the 
permutation is chosen independently at random. 

We have only $n^{3/4}$ columns, and a 2-universal permutation takes O(lg n) bits to represent, so
we can afford to store all these permutations. To complete the construction, break each column
into $b/n^{3/4}$ equal-sized buckets. The position within a bucket is relegated to the quotient.

Using a classic Chernoff bound, we show that imposing the bucket granularity does not generate
too many collisions. Let $X_i$ be the number of elements in column i that are mapped to the
same table cell as some other element. Because we look at only $n^{1/4}$ elements in each column,
$X_i \in [0, n^{1/4}]$. Let $\mu = \mathbf{E}[\sum_i X_i]$; by linearity of expectation, $\mu \le \frac{n^2}{b}$. By the Chernoff bound,
$\Pr[\sum_i X_i \ge 2\mu + n^{3/4}] \le \exp(-\Omega(\frac{\mu + n^{3/4}}{n^{1/4}})) = e^{-n^{\Omega(1)}}$. Thus, with exponentially high probability,
the number of collisions is bounded by $\frac{2n^2}{b} + n^{3/4}$. This completes the analysis.

The case b < n. Ideally, each bucket should contain $\frac{n}{b}$ elements. We are interested in buckets
of size exceeding $(1 + \delta)\frac{n}{b}$, and want to bound the number of elements in such buckets close to the
expected number for a highly independent permutation. 

As explained already, we do not know any family of highly independent permutations that can 
be represented with small space and evaluated efficiently. Instead, we will revert to the brute-force 
solution of representing truly random permutations. To use this idea and keep the space small, we 
need two tricks. The first trick is to generate and store fewer permutations than columns. It turns 
out that re-using the same permutation for multiple columns still gives enough independence. 

Note, however, that we cannot even afford to store a random permutation inside a single column, 
because columns might have more than n elements. However, we can reduce columns to size $\sqrt{n}$
as follows. Use Transformation 3 with $b' = n^{5/4}$. This puts elements into $n^{5/4}$ first-order buckets
($\sqrt{n}$ buckets per column), with a negligible number of collisions. Thus, we can now work at the
granularity of first-order buckets, and ignore the index within a bucket of an element. 

Transformation 4. Group columns into $n^{1/4}$ equal-sized groups. For each group, generate a
random permutation on $\sqrt{n}$ positions, and apply it to the first-order buckets inside each column of
the group.

The space required to represent the permutations is $O(n^{3/4} \lg n)$ bits. To complete the construction,
we just break columns into $b/n^{3/4}$ equal-sized buckets. In other words, $n^{5/4}/b$ first-order
buckets are grouped into an output bucket. If $b < n^{3/4}$, we are already done, since it suffices to
know elements are well distributed in columns.

We begin by analyzing the probability that a fixed element x ends up in an overflowing bucket. 
The events that other elements end up in the same bucket are negatively correlated (because we 
have permutations). Thus, we can upper bound the probability by assuming that the other $n^{1/4}$
elements can be independently mapped to x’s bucket with probability $\frac{n^{3/4}}{b}$. Then, by the Chernoff
bound, the probability that $(1 + \delta)\frac{n}{b}$ other elements get mapped to x’s bucket is at most $e^{-\delta^2 n/(3b)}$.
This is the probability that x is in an overflowing bucket.

Now let $X_i$ be the number of elements that overflow in column group i. We have $X_i \in [0, n^{3/4}]$
and $\mathbf{E}[\sum_i X_i] = \mu \le n e^{-\delta^2 n/(3b)}$. Then, by the Chernoff bound, $\Pr[\sum_i X_i \ge 2\mu + n^{\alpha}] \le
\exp(-\Omega(n^{\alpha}/n^{3/4})) = e^{-n^{\Omega(1)}}$ if $\alpha > \frac{3}{4}$. This completes the analysis.

2.3 Coping with a dynamic set 

We now discuss an important subtlety in the probabilistic analysis needed by our data structure. 
In general, we are dealing with a dynamic set S, and mapping it through a quotient hash function.
The analysis above shows that, for any fixed S, the number of overflow elements is small. However,
imagine an element x that becomes an overflow element at the time of insertion. Then, as some
other element y is deleted, x might no longer be in an overflow condition in the new set, but our data
structure has already handled it as an overflow element. Thus, it does not suffice to look at the 
overflow conditions for a fixed set S. The relevant quantity is the number of elements that caused 
an overflow at the time of insertion. 

Fortunately, the same bounds from above hold, where n is an upper bound on the size of S.
This follows from the same analysis, but interpreted in a more subtle way. The probability that an 
insertion causes an overflow is unchanged. By linearity of expectation (over the elements currently 
in the set S), the expected number of elements that cause an overflow when inserted is the same 
as the expected bounds from above. But whether an element causes an overflow depends only 
on the random coins pertaining to the column of the element. Thus, by the same independence 
considerations as above, this expectation is matched with high probability. 

3 Solution for Membership and Perfect Hashing 

There are two easy cases. First, if $u = \Omega(n^{1.5})$, then the space bound is $\Theta(n \lg u)$. In this case, a
solution with hashcode range exactly [n] can be obtained by using a high-performance dictionary [?].
We store an explicit hashcode as the data associated with each value, and maintain a list of free
hashcodes. This takes $O(n \lg n + n \lg u) = O(n \lg u)$ bits. Second, if $t = O(n^{\alpha})$, for α < 1, then the
space bound is $\Theta(n \lg n)$. Because $u = O(n^{1.5})$, we can use the same brute-force solution. In the
remaining cases, we can assume $t \le \frac{n^2}{u}$ (we are always free to decrease t), so that the space bound
is dominated by $\Theta(n \lg \frac{n}{t})$.

The data structure is composed of three levels. An element is inserted into the first level that
can handle it. The first-level filter outputs hashcodes in a range of size n plus a constant fraction
of t, and handles most elements of S: at most $c_1 t$ elements (for a small constant $c_1$ to be
determined) are passed on to the second level, with high probability. The goal of the second-level
filter is to handle all but $O(\frac{n}{\lg n})$ elements with high probability. If $c_1 t$ is already below this
threshold, this filter is not used. Otherwise, we use this filter, which outputs hashcodes in a range
whose size is a constant fraction of t. Finally, the third level is just a brute-force solution
using a high-performance dictionary. Because it needs to handle only $\min\{O(\frac{n}{\lg n}), c_1 t\}$ elements,
its output range can also be a constant fraction of t, so that the three ranges together fit in [n + t],
and the space is O(n) bits. This dictionary can always be made to
work with high probability in n (e.g. by inserting dummy elements up to $\Omega(\sqrt{n})$ values).

A query tries to locate the element in all three levels. Because all levels can answer membership 
queries, we know when we’ve located an element, and we can just obtain a hashcode from the 
appropriate level. Similarly, deletion just removes the element from the appropriate level. 
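
The control flow across the three levels can be sketched as follows. Everything here is a simplified assumption: each `Level` is a brute-force stand-in with explicit hashcodes, and a level "cannot handle" an element only when its code range is exhausted, whereas in the paper the first two levels reject elements because of bucket overflow or a failed insertion. The class names and capacities are hypothetical. The sketch shows disjoint hashcode ranges per level, queries that probe every level, and hashcodes that stay fixed while other elements come and go.

```python
class Level:
    """Brute-force stand-in for one level with its own block of hashcodes."""

    def __init__(self, capacity: int, offset: int):
        self.offset = offset                          # start of this level's code range
        self.free = list(range(capacity - 1, -1, -1))  # pop() yields 0, 1, 2, ...
        self.code = {}

    def try_insert(self, x) -> bool:
        if not self.free:
            return False                              # this level cannot handle x
        self.code[x] = self.free.pop()
        return True

    def contains(self, x) -> bool:
        return x in self.code

    def hashcode(self, x) -> int:
        return self.offset + self.code[x]

    def delete(self, x) -> None:
        self.free.append(self.code.pop(x))


class ThreeLevelDictionary:
    def __init__(self, capacities):
        self.levels, offset = [], 0
        for cap in capacities:                        # disjoint code ranges, in order
            self.levels.append(Level(cap, offset))
            offset += cap

    def _find(self, x):
        for level in self.levels:                     # a query tries every level
            if level.contains(x):
                return level
        return None

    def member(self, x) -> bool:
        return self._find(x) is not None

    def insert(self, x) -> None:
        if self.member(x):
            return
        for level in self.levels:                     # first level that can take x
            if level.try_insert(x):
                return
        raise OverflowError("no level can handle the element")

    def hashcode(self, x) -> int:
        return self._find(x).hashcode(x)

    def delete(self, x) -> None:
        level = self._find(x)
        if level is not None:
            level.delete(x)
```

Membership support at every level is what lets the query know which level owns an element; without it, the sketch could not decide which hashcode to return.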

3.1 The first-level filter 

Let $\mu = c_2 (\frac{n}{t})^3$, for a constant $c_2$ to be determined. We use a quotient hash function mapping
the universe into $b = \frac{n}{\mu}$ buckets. Then, we expect μ elements per bucket, but we will allow for
an additional $\mu^{2/3}$ elements. By Theorem 5, the number of elements that overflow is with high
probability at most $n e^{-\Omega(\mu^{1/3})} + n^{\alpha}$. For big enough $c_2$, this is at most a constant fraction of $c_1 t$
(remember that we are in the case when $n^{\alpha}$ is negligible).

Now we describe how to handle the elements inside each bucket. For each bucket, we have a 
hashcode space of $[\mu + \mu^{2/3}]$. Then, the code space used by the first-level filter is $n + \frac{t}{\sqrt[3]{c_2}}$, which
fits in the first-level range for big enough $c_2$. We use a high-performance dictionary inside each
bucket, which stores hashcodes as associated data. We also store a list of free hashcodes to
facilitate insertions. To analyze the space,
observe that a hashcode takes only $O(\lg \frac{n}{t})$ bits to represent. In addition, the high-performance
dictionary need only store the quotient of an element. Indeed, the element is uniquely identified
by the quotient and the bucket, so to distinguish between the elements in a bucket we only need a
dictionary on the quotients. Thus, we need $O(\lg \frac{u}{b}) = O(\lg \frac{u}{n} + \lg \frac{n}{t})$ bits per element.
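
The two counts in this paragraph can be written out step by step. The derivation below takes the first-level parameters to be $\mu = c_2 (n/t)^3$ expected elements per bucket and $b = n/\mu$ buckets, with $\mu + \mu^{2/3}$ hashcodes reserved per bucket; this reading of the parameters is an assumption of the sketch.

```latex
b\,\bigl(\mu + \mu^{2/3}\bigr)
  = \frac{n}{\mu}\,\bigl(\mu + \mu^{2/3}\bigr)
  = n + \frac{n}{\mu^{1/3}}
  = n + \frac{n}{c_2^{1/3}\,(n/t)}
  = n + \frac{t}{c_2^{1/3}},
\qquad
\lg\frac{u}{b} = \lg\frac{u\,\mu}{n}
  = O\!\Bigl(\lg\frac{u}{n} + \lg\mu\Bigr)
  = O\!\Bigl(\lg\frac{u}{n} + \lg\frac{n}{t}\Bigr).
```

The first identity shows that the slack in the code space shrinks as $c_2$ grows; the second uses that $c_2$ is a constant, so $\lg\mu = O(\lg\frac{n}{t})$.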

The last detail we need to handle is what happens when an insertion in the bucket’s dictionary 
fails. This happens with probability $\mu^{-c_3}$ for each insertion, where $c_3$ is any desired constant. We
can handle a failed insertion by simply passing the element to the second level. The expected
number of elements whose insertion at the first level failed is $n \mu^{-c_3}$, which is at most a small
constant fraction of $c_1 t$ for big enough $c_3$. Since
we can assume $t = \Omega(n^{5/6})$, we have $\mu = O(\sqrt{n})$ and $b = \Omega(\sqrt{n})$. This means we have $\Omega(\sqrt{n})$
dictionaries, which use independent random coins. Thus, a Chernoff bound guarantees that we
exceed twice this expectation with probability at most $e^{-n^{\Omega(1)}}$, because $t = \Omega(n^{\alpha})$.

Thus, at most $c_1 t$ elements in total are passed to the second level with high probability.

3.2 The second-level filter 

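We first observe that this filter is used only when $\lg\frac{u}{n} = O(\sqrt[4]{\lg n})$. Indeed, we can
assume $t \le n^2/u$ (a larger $t$ does not improve the space bound asymptotically), so when
$\lg\frac{u}{n} = \Omega(\sqrt[4]{\lg n})$, we have $t = o(n/\lg n)$, and we can skip directly to the third level.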

We use a quotient hash function mapping the universe to $b = \frac{c_1 t}{\sqrt{\lg n}}$ buckets. We allow each
bucket to contain up to $2\sqrt{\lg n}$ elements; overflow elements are passed to the third level. By
Theorem El at most $n/2^{\Omega(\sqrt[6]{\lg n})} = o(n/\lg n)$ elements are passed to the third level, with high
probability.

Because buckets contain $O(\sqrt{\lg n})$ elements of $O(\sqrt[4]{\lg n})$ bits each, we can use word-packing
tricks to handle buckets in constant time. However, the main challenge is space, not time. Observe
that we can afford only $O(\lg\frac{u}{t})$ bits per element, which can be much smaller than $O(\sqrt[4]{\lg n})$.
This means that we cannot even store a permutation of the elements inside a bucket, which would cost
$\Theta(\lg\lg n)$ bits per element. In particular, it is information-theoretically impossible even to
store the elements of a bucket in an arbitrary order!

Coping with this challenge requires a rather complex solution: we employ $O(\lg\lg n)$ levels of
filters and permutation hashing inside each bucket. Let us describe the level-$i$ filter inside a bucket.
First, we apply a random permutation to the bucket universe (the quotient of the elements inside
the bucket). Then, the filter breaks the universe into $c_4\sqrt{\lg n}/2^i$ equal-sized tiles. The filter
consists of an array with one position per tile. Such a position is either empty, or it stores the index
within the tile of an element mapped to that tile (which is a quotient induced by the permutation
at this level). Observe that the size of the tiles doubles for each new level, so the number of entries
in the filter array halves. In total, we use $h = \frac{1}{4}\lg\lg n$ filters, so that the number of tiles in any
filter is $\Omega(\sqrt[4]{\lg n})$.

Conceptually, an insertion traverses the filters sequentially starting with $i = 0$. It applies
permutation $i$ to the element, and checks whether the resulting tile is empty. If so, it stores the
element in that tile; otherwise, it continues to the next level. Elements that cannot be mapped in
any of the $h$ levels are passed on to the third level of our big data structure. A deletion simply
removes the element from the level where it is stored. A perfect-hash query returns the identifier
of the tile where the element is stored. Because the number of tiles decreases geometrically, we use
less than $2c_4\sqrt{\lg n}$ hashcodes per bucket. We have $b = \frac{c_1 t}{\sqrt{\lg n}}$ buckets in total and we
can make $c_1$ as small as we want, so the total number of hashcodes, which is less than $2c_1c_4t$, can be
made an arbitrarily small constant fraction of $t$.
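
The cascade of filters inside one bucket can be sketched as below. This is an illustrative toy under stated assumptions: permutations are stored explicitly (feasible only for a small bucket universe), the number of tiles is passed in directly rather than derived from $\lg n$, and the class name `BucketFilters` and its parameters `v`, `tiles_0`, `h` are hypothetical.

```python
import random
from typing import List, Optional, Tuple


class BucketFilters:
    """Toy cascade of h filters over a bucket universe {0, ..., v-1}.

    Level i has (tiles_0 >> i) tiles; each tile is empty (None) or stores the
    index, within the tile, of the single element placed there.
    """

    def __init__(self, v: int, tiles_0: int, h: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.v = v
        self.perms: List[List[int]] = []
        self.tiles: List[List[Optional[int]]] = []
        for i in range(h):
            perm = list(range(v))
            rng.shuffle(perm)               # explicit random permutation (assumption)
            self.perms.append(perm)
            self.tiles.append([None] * max(1, tiles_0 >> i))

    def _locate(self, i: int, q: int) -> Tuple[int, int]:
        # Tile number and index within the tile of the permuted quotient.
        size = -(-self.v // len(self.tiles[i]))   # ceil(v / number of tiles)
        return divmod(self.perms[i][q], size)

    def insert(self, q: int) -> Optional[Tuple[int, int]]:
        # Assumes q is not already stored.
        for i in range(len(self.tiles)):
            tile, idx = self._locate(i, q)
            if self.tiles[i][tile] is None:
                self.tiles[i][tile] = idx
                return i, tile              # (level, tile) serves as the hashcode
        return None                         # caller passes q on to the third level

    def query(self, q: int) -> Optional[Tuple[int, int]]:
        for i in range(len(self.tiles)):
            tile, idx = self._locate(i, q)
            if self.tiles[i][tile] == idx:  # exact match, since perms are bijections
                return i, tile
        return None

    def delete(self, q: int) -> None:
        where = self.query(q)
        if where is not None:
            self.tiles[where[0]][where[1]] = None
```

Since each permutation is a bijection, a tile number together with the stored index identifies the element exactly, so the same array answers membership and yields distinct `(level, tile)` identifiers.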
We now analyze the space needed by this construction. Observe that the size of the bucket
universe is $v = u \cdot \frac{\sqrt{\lg n}}{c_1 t}$. Thus, at the first level, the filter requires
$\lg\frac{v}{c_4\sqrt{\lg n}}$ bits to store an index within each tile. At each consecutive level, the number of
bits per tile increases by one (because tiles double in size), but the number of tiles halves. Thus, the
total space is dominated by the first level, and it is $O(\lg\frac{u}{t}) = O(\lg\frac{u}{n} + \lg\frac{n}{t})$ bits
per element.
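
The claim that the first level dominates can be checked with a short calculation. As notation introduced only for this illustration, write $T$ for the number of tiles at level 0 and $w$ for the number of index bits per tile at level 0; level $i$ then has $T/2^i$ tiles of $w+i$ bits each.

```latex
\sum_{i=0}^{h-1} \frac{T}{2^{i}}\,(w+i)
  \;\le\; T\Bigl(w\sum_{i\ge 0} 2^{-i} \;+\; \sum_{i\ge 0} i\,2^{-i}\Bigr)
  \;=\; T\,(2w+2).
```

So all levels together cost at most $2(w+1)$ bits per level-0 tile, which is within a constant factor of the $Tw$ bits used by the first level alone.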

Bounding the unfiltered elements. We now want to bound the expected number of elements in
a bucket that are not handled by any of these filters. In doing so, we assume that the permutation
hash functions at each level are truly random; this will be justified later. We assume that, at
level $i$, there are at most $\sqrt{\lg n}/2^{c_5 i}$ elements that haven't been handled by previous levels
(for $c_5$ to be determined). When analyzing filter $i$, there is a probability that this condition will not
be met at level $i + 1$. If this happens, we just assume pessimistically that all remaining elements are
rejected by all filters. However, the probability of this event will be small enough, so the total
expectation of the number of rejected elements will be small.

At level $i$ there are $c_4\sqrt{\lg n}/2^i$ tiles, so an element has probability at most
$\frac{1}{c_4}2^{-(c_5-1)i}$ of conflicting with another element. Then the expected number of conflicting
elements is $\mu = \frac{\sqrt{\lg n}}{c_4}2^{-(2c_5-1)i}$. Conflicts are negatively correlated, so we can
apply a Chernoff bound on the number of elements not handled at this level. We want to analyze the case
when there are more than $(1+\delta)\mu = \sqrt{\lg n}/2^{c_5(i+1)}$ unhandled elements (which invalidates
our assumption at level $i + 1$). We have $1 + \delta = \frac{c_4}{2^{c_5}}2^{(c_5-1)i}$.

By Chernoff bounds, the probability of this event is at most
$\bigl(e^{\delta}/(1+\delta)^{1+\delta}\bigr)^{\mu} \le \bigl(\frac{e}{1+\delta}\bigr)^{(1+\delta)\mu}$. We
now distinguish three ranges for $i$:

• As long as $(1+\delta)\mu \ge \sqrt[4]{\lg n}$, observe that, for any $c_5$, $c_4$ can be chosen large enough
so that $1 + \delta \ge 2e$, so the probability of the bad event is at most
$2^{-(1+\delta)\mu} \le 2^{-\sqrt[4]{\lg n}}$. Thus, the contribution of this bad event to the total expectation
is $o(\frac{1}{\lg n})$, even summed over all levels.

• As long as $(1+\delta)\mu \ge 16$, the probability is bounded by $\bigl(\frac{e}{1+\delta}\bigr)^{16}$. Note
that since $(1+\delta)\mu < \sqrt[4]{\lg n}$, we have $1 + \delta = \Omega(\sqrt[8]{\lg n})$. Then, the failure
probability is $O(\frac{1}{\lg^2 n})$, so the contribution to the total expectation is $o(\frac{1}{\lg n})$, even
summing over all levels.

• If $(1+\delta)\mu < 16$, we only have to handle $O(1)$ elements. The probability of any conflict
is inversely proportional to the number of tiles, which is $\Omega(\sqrt[4]{\lg n})$. The probability of a
persistent conflict decreases exponentially with the number of levels, so after $O(1)$ levels, we
expect $o(\frac{1}{\lg n})$ conflicts. For large enough $c_5$, $(1+\delta)\mu$ decreases sufficiently rapidly that we
will actually have the required $O(1)$ levels in this range.
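
For readers who want to plug in numbers, the following sketch evaluates the upper-tail bound used in this case analysis. It assumes the bound invoked is the usual multiplicative Chernoff bound $\Pr[X \ge (1+\delta)\mu] \le (e^{\delta}/(1+\delta)^{1+\delta})^{\mu}$; the function names are hypothetical.

```python
import math


def chernoff_upper_tail(mu: float, delta: float) -> float:
    """(e^delta / (1+delta)^(1+delta))^mu, computed in log space."""
    return math.exp(mu * (delta - (1 + delta) * math.log1p(delta)))


def simplified_tail(mu: float, delta: float) -> float:
    """The weaker but handier form (e / (1+delta))^((1+delta) * mu)."""
    return (math.e / (1 + delta)) ** ((1 + delta) * mu)


# When 1 + delta >= 2e, the simplified form is at most 2^-((1+delta) * mu).
mu, delta = 3.0, 2 * math.e
exact = chernoff_upper_tail(mu, delta)
loose = simplified_tail(mu, delta)
power_of_two = 2.0 ** (-(1 + delta) * mu)
```

The chain `exact <= loose <= power_of_two` is what the first range of the case analysis relies on once $1+\delta \ge 2e$.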

Implementation details. We now discuss how to implement the conceptual ideas from above,
and make operations take constant time. Note that $\lg v = O(\sqrt[4]{\lg n})$. Thus, the output of
$O(\lg\lg n)$ permutation functions takes $O(\sqrt[4]{\lg n}\cdot\lg\lg n)$ bits, and easily fits in a word. For
now, assume that we can evaluate the output of all these functions in constant time. The description of
an entire bucket takes $O(\sqrt{\lg n}\cdot\lg\frac{u}{t}) = O(\lg^{3/4} n)$ bits, so a bucket can also be
manipulated as a word. Then, updates and queries of a bucket are functions on a range of
$O(\sqrt[4]{\lg n}\cdot\lg\lg n)$ bits for the argument, and $O(\lg^{3/4} n)$ bits for the bucket
representation. So they can be implemented in constant time using a lookup table of size
$\exp(O(\lg^{3/4} n\cdot\lg\lg n))\cdot O(\lg n) = n^{o(1)}$ bits.
It remains to implement the random permutations. We can simply generate $O(\lg\lg n)$ random
permutations on the bucket universe $[v]$, and create a lookup table with the packed output of all
permutations, applied to each possible input. This takes space $\exp(O(\sqrt[4]{\lg n}))\cdot O(\lg n)$ bits.
We cannot afford such a collection of random permutations for each bucket, but we can re-use just a few
random permutations. In particular, we can generate $n^{2/3}$ independent lookup tables, and use each
one for an equal share of the buckets. Then, the number of elements that get passed to the third
level because they are not handled by any filter inside their bucket is the sum of $n^{2/3}$ independent
components. By the Chernoff bound, the expectation (analyzed above) is exceeded by a constant
factor only with exponentially small probability.


4 Solution for Perfect Hashing 


The data structure supporting perfect hashing but not membership consists of one quotient hash
function, selected from the family of Theorem 0 and two instances of the data structure of
Theorem 0 supporting perfect hashing and membership. The quotient hash function divides the universe
into $b$ buckets, and we set $b = c\,n^2/t$ for a constant $c \ge 1$ to be determined.

The first data structure supporting perfect hashing and membership stores the set B of buckets 
occupied by at least one element of S. An entry in B effectively represents an element of S that is 
mapped to that bucket. However, we have no way of knowing the exact element. The second data 
structure supporting perfect hashing and membership stores the additional elements of S, which 
at the time of insertion were mapped to a bucket already in B. 

Insertions check whether the bucket containing the element is in $B$. If not, we insert the bucket
into $B$. Otherwise, we insert the element into the second data structure. Deletions proceed in the
reverse order. First, we check whether the element is listed in the second data structure, in which
case we delete it from there. Otherwise, we delete the bucket containing the element from the first
data structure.

The range of the first perfect hash function should be $[n + t/2]$. For the second one, it should
be $[t/2]$; we show below that this is sufficient with high probability. Thus, we use $[n + t]$ distinct
hashcodes in total. To perform a query, we first check whether the element is listed in the second
data structure. If it is, we return the label reported by that data structure (offset by $n + t/2$ to avoid
the hashcodes from the first data structure). Otherwise, because we assume that the element is
in $S$, it must be represented by the first data structure. Thus, we compute the bucket assigned
to the element by the quotient hash function, look up that bucket in the first data structure, and
return its label.
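
The interplay of the two structures can be sketched as follows. This is a toy model under stated assumptions: Python dicts stand in for the two space-efficient structures, the bucket map `x mod b` stands in for the quotient hash function, and the class name `PerfectHashOnly` is hypothetical. Note that the first dict is keyed by bucket only, so it never learns which element a bucket represents.

```python
from typing import Dict, List


class PerfectHashOnly:
    """Toy perfect hashing without membership, using n + t hashcodes."""

    def __init__(self, n: int, t: int, b: int) -> None:
        self.b = b
        self.offset = n + t // 2                  # size of the first range
        self.first: Dict[int, int] = {}           # bucket  -> label in [n + t/2]
        self.second: Dict[int, int] = {}          # element -> label in [t/2]
        self.free_first: List[int] = list(range(self.offset))
        self.free_second: List[int] = list(range(t // 2))

    def _bucket(self, x: int) -> int:
        return x % self.b                         # stand-in for the quotient hash

    def insert(self, x: int) -> None:
        beta = self._bucket(x)
        if beta not in self.first:
            self.first[beta] = self.free_first.pop()
        else:
            # Raises IndexError if more than t/2 elements collide (toy behavior).
            self.second[x] = self.free_second.pop()

    def delete(self, x: int) -> None:
        if x in self.second:
            self.free_second.append(self.second.pop(x))
        else:
            self.free_first.append(self.first.pop(self._bucket(x)))

    def query(self, x: int) -> int:
        # Only meaningful for x currently in the set.
        if x in self.second:
            return self.offset + self.second[x]
        return self.first[self._bucket(x)]
```

At any time each bucket in the first dict represents exactly one element of the set, which is why labels stay distinct even though that element's identity is never stored.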

It remains to analyze the space requirement. We are always free to reduce $t$, so we can assume
$t = O(n/\lg\frac{u}{n})$, simplifying our space bound to $O(n\lg\frac{n}{t})$. Because $|B| \le n$, the first
data structure needs space
$O\bigl(\lg\binom{b}{n} + n\lg\frac{n}{t/2+1}\bigr) = O(n\lg\frac{b}{n} + n\lg\frac{n}{t}) = O(n\lg\frac{n}{t})$,
which is within the desired bound.

Because $b \ge n$, our family of hash functions guarantees that, with high probability, the number
of elements of $S$ that were mapped to a nonempty bucket at the time of their insertion is at most
$O(n^2/b) + n^{\alpha} = O(t/c) + n^{\alpha}$. If $n^{\alpha}$ is a small enough constant fraction of $t$, this is
at most $t/4$ for sufficiently large $c$. If $t = O(n^{\alpha})$, we can use a brute-force solution: first,
construct a perfect hashing structure with $t = n$ (this is possible through the previous case); then,
relabel the used positions in the $[2n]$ range to a minimal range of $[n]$, using $O(n\lg n)$ memory bits.

Given this bound on the number of elements in the second structure, note that the number of
hashcodes allowed ($t/2$) is double the number of elements. Thus the space required by the second
structure is
$O\bigl(\lg\binom{u}{t} + t\bigr) = O(t\lg\frac{u}{t}) = O\bigl(\frac{n}{\lg(u/n)}\lg\bigl(\frac{u}{n}\lg\frac{u}{n}\bigr)\bigr) = O(n)$.


5 Lower Bound for Perfect Hashing 

This section proves Theorem [21 assuming $u \ge 2n$. We defer the case of smaller $u$ to the full version.

Our lower bound considers the dynamic set $S$ which is initially $\{n+1, \ldots, 2n\}$ and is transformed
through insertions and deletions into $\{1, \ldots, n\}$. More precisely, we consider $\frac{n}{2t}$ stages. In
stage $i$, we pick a random subset $D_i \subseteq S \cap \{n+1, \ldots, 2n\}$, of cardinality $2t$. Then, we delete
the elements in $D_i$, and we insert elements $I_i = \{(i-1)2t+1, \ldots, i \cdot 2t\}$. Note that, in the end, the
set is $\{1, \ldots, n\}$. By the easy direction of Yao's minimax principle, we can fix the random bits of the
data structure, such that it uses the same expected space over the input distribution.

Our strategy is to argue that the data structure needs to remember a lot of information about
the history, i.e., there is large hysteresis in the output of the perfect hash function. Intuitively,
the $2t$ elements inserted in each stage need to be mapped to only $3t$ positions in the range: the $t$
positions free at the beginning of the stage, and the $2t$ positions freed by the recent deletes. These
free positions are quite random, because we deleted random elements. Thus, this choice is very
constrained, and the data structure needs to remember the constraints.
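
The update sequence used by the lower bound can be generated with a few lines of code. This is a minimal sketch assuming that $2t$ divides $n$; the function name `hard_sequence` and the `seed` parameter are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def hard_sequence(n: int, t: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (D_i, I_i) for stages i = 1, ..., n/(2t); assumes 2*t divides n."""
    rng = random.Random(seed)
    old = set(range(n + 1, 2 * n + 1))          # surviving part of the initial set
    for i in range(1, n // (2 * t) + 1):
        # D_i: a random subset of 2t survivors, deleted in this stage.
        deleted = rng.sample(sorted(old), 2 * t)
        old.difference_update(deleted)
        # I_i: the next block of 2t small elements, inserted in this stage.
        inserted = list(range((i - 1) * 2 * t + 1, i * 2 * t + 1))
        yield deleted, inserted
```

Replaying the stages on the initial set $\{n+1, \ldots, 2n\}$ ends with exactly $\{1, \ldots, n\}$, and the set always has $n$ elements between stages.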

Let $h$ be a function mapping each element in $[2n]$ to the hashcode it was assigned; this is well
defined, because each element is assigned a hashcode exactly once (though for different intervals of
time). We argue that the vector of sets $(h(I_1), \ldots, h(I_{n/2t}))$ has entropy $\Omega(n\lg\frac{n}{t})$. One
can recover this vector by querying the final state of the data structure, so the space lower bound follows.

We first break up the entropy of the vector by:
$H(h(I_1), \ldots, h(I_{n/2t})) = \sum_j H(h(I_j) \mid h(I_1), \ldots, h(I_{j-1}))$.
Now observe that the only randomness up to stage $j$ is in the choices of $D_1, \ldots, D_{j-1}$. In other
words, $D_1, \ldots, D_{j-1}$ determine $h(I_1), \ldots, h(I_{j-1})$. Then,
$H(h(I_1), \ldots, h(I_{n/2t})) \ge \sum_j H(h(I_j) \mid D_1, \ldots, D_{j-1})$. To alleviate notation, let
$D_{<j}$ be the vector $(D_1, \ldots, D_{j-1})$.

Now we lower bound each term of the sum. Let $F_j$ be the set of free positions in the range
at the beginning of stage $j$. Because we made the data structure deterministic, $F_j$ is fixed by
conditioning on $D_{<j}$. Because $I_j$ can be mapped to free positions only after $D_j$ is deleted, we find
that $h(I_j) \subseteq F_j \cup h(D_j)$. Note that $|h(I_j)| = 2t$, but $|F_j| = t$. Thus,
$|h(D_j) \setminus h(I_j)| \le t$.

Now we argue that the entropy of $h(D_j)$ is large. Indeed, $D_j$ is chosen randomly from
$S \cap \{n+1, \ldots, 2n\}$, a set of cardinality $n - 2t(j-1)$. Conditioned on $D_{<j}$, the set
$S \cap \{n+1, \ldots, 2n\}$ is fixed, so its image through $h$ is fixed. Then, choosing $D_j$ randomly is
equivalent to choosing $h(D_j)$ randomly from a fixed set of cardinality $n - 2t(j-1)$. So
$H(h(D_j) \mid D_{<j}) = \lg\binom{n-2t(j-1)}{2t}$. Now consider $h(D_j) \setminus h(I_j)$. This is a set of
cardinality at most $t$ from the same set of $n - 2t(j-1)$ positions. Thus,
$H(h(D_j) \setminus h(I_j) \mid D_{<j}) \le \lg\binom{n-2t(j-1)}{t} + t$.

Using $H(a, b) \le H(a) + H(b)$, we have

$$H(h(D_j) \mid D_{<j}) \le H(h(D_j) \cap h(I_j) \mid D_{<j}) + H(h(D_j) \setminus h(I_j) \mid D_{<j}).$$

Of course, $H(h(I_j) \mid D_{<j}) \ge H(h(I_j) \cap h(D_j) \mid D_{<j})$. This implies


$$H(h(I_j) \mid D_{<j}) \ge H(h(D_j) \mid D_{<j}) - H(h(D_j) \setminus h(I_j) \mid D_{<j})
  \ge \lg\binom{n-2t(j-1)}{2t} - \lg\binom{n-2t(j-1)}{t} - t.$$


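Using $\binom{a}{2t} / \binom{a}{t} = \binom{a-t}{t} / \binom{2t}{t}$, we have
$H(h(I_j) \mid D_{<j}) \ge \lg\bigl[\binom{n-2t(j-1)-t}{t} / \binom{2t}{t}\bigr] - t$. For $j \le \frac{n}{4t}$, we have
$H(h(I_j) \mid D_{<j}) = \Omega(t\lg\frac{n}{t})$. We finally obtain
$H(h(I_1), \ldots, h(I_{n/2t})) = \Omega(n\lg\frac{n}{t})$, concluding the proof.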
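
The per-stage bound can be sanity-checked numerically. The sketch below assumes the binomial identity $\binom{a}{2t}\binom{2t}{t} = \binom{a}{t}\binom{a-t}{t}$ (both sides count ways to pick $2t$ items and mark $t$ of them) and evaluates $\lg\binom{a}{2t} - \lg\binom{a}{t} - t$ for sample values; the function name `per_stage_bound` and the sample values of `a` and `t` are hypothetical.

```python
from math import comb, log2


def per_stage_bound(a: int, t: int) -> float:
    """lg C(a, 2t) - lg C(a, t) - t, for a = n - 2t(j-1) remaining positions."""
    return log2(comb(a, 2 * t)) - log2(comb(a, t)) - t


a, t = 1000, 10

# The identity, checked in exact integer arithmetic.
identity_holds = comb(a, 2 * t) * comb(2 * t, t) == comb(a, t) * comb(a - t, t)

# Since C(a-t, t) >= ((a-t)/t)^t and C(2t, t) <= 4^t, the bound is at least
# t * lg((a-t)/t) - 3t, which is Omega(t lg(a/t)) once a/t is a large constant.
bound = per_stage_bound(a, t)
floor_estimate = t * log2((a - t) / t) - 3 * t
```

Summing a bound of this order over the first half of the $\frac{n}{2t}$ stages is what yields the $\Omega(n \lg\frac{n}{t})$ total.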